Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io/ompio: change the default value of mca parameter #2424

Merged

Conversation

edgargabriel
Copy link
Member

change the default value of the mca_io_ompio_cycle_buffer_size parameter in order to avoid accidental truncation of a file for very large individual operations.
Thanks to @cniethammer for reporting it.

Signed-off-by: Edgar Gabriel egabriel@central.uh.edu

change the default value of the mca_io_ompio_cycle_buffer_size parameter in order to avoid accidental truncation of a file for very large individual operations.
Thanks to @cniethammer for reporting it.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
@edgargabriel
Copy link
Member Author

@cniethammer would you mind reviewing this change, since you reported the problem and successfully ran the application after applying this change?

Copy link
Contributor

@cniethammer cniethammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents the issue in first place.

But the logic for the automatic buffer size selection (to be max_data) in ompi/mca/io/ompio/io_ompio_file_write.c and ompi/mca/io/ompio/io_ompio_file_read.c should be either improved to something like
bytes_per_cycle = (max_data < WRITE_SIZE_LIMIT) ?max_data : WRITE_SIZE_LIMIT;
where WRITE_SIZE_LIMIT should be platform dependent (on Linux 2,147,479,552)
or removed as it still checks for the value -1 and can trigger the issue when the user wants to use "auto" mode.

@edgargabriel
Copy link
Member Author

That's a good point, but I would ask maybe @jsquyres to chime in on that: to the best of knowledge, the Open MPI philosophy with MCA parameters was always to follow the request of the user, even if this leads to an error. In this particular situation, you are absolutely right that if the user would disable the cycle_buffer_size feature by setting it to -1, his code might fail. However, after this merge, this would only happen if he explicitly requested that behavior.

Your suggestion would basically overwrite the request of the user.

A second more practical difficulty is, that there is to the best of my knowledge no predefined constant in header files that we could just use. There is RLIMIT_FSIZE which indicates the maximum file size overall, the posix write manpages talks about SSIZE_MAX, both of which are however way too large. So this would require developing the entire configure logic etc. to make it portable. I can do that if the community decides that this is the way to go, but in that case this fix might come too late for the 2.0.2 release.

@jsquyres
Copy link
Member

jsquyres commented Dec 2, 2016

We talked about this in email. The decision was:

  • This PR is a good fix for the initial problem. It seems to give good enough performance and fix the bug.
  • That being said, the 512MB buffer size has not been fully evaluated. We need to keep working on a proper solution for the long term -- that will likely involve either fully-vetting a fixed size for the default, and/or evaluating using "-1" as the default and having appropriate loops to ensure completion of blocking and non-blocking writes.

In short: more work still needs to be done -- this is a short-term fix while this more detailed work is ongoing.

@jsquyres jsquyres dismissed cniethammer’s stale review December 2, 2016 18:56

Per #2424 (comment), allowing this PR to go through.

@jsquyres jsquyres merged commit 065d0fc into open-mpi:v2.0.x Dec 2, 2016
@edgargabriel edgargabriel deleted the pr/v2.0.x-cycle-buf-default-val branch July 17, 2017 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants